PH345: Winter 2026
Which color appears most often?
Aesthetics (or encodings) map data to visual properties of a plot, such as position, color, length, shape, area, and volume
The choice of aesthetics can help or hinder your audience’s understanding of what the data are showing
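A minimal ggplot2 sketch of the idea: the same five proportions mapped to two different aesthetics, once to position/length (bar height) and once to fill color. The values in `groups` are hypothetical, chosen only for illustration; they are not the values used in the class exercise.

```r
library(ggplot2)

# Hypothetical proportions for five groups (A-E); illustration only
groups <- data.frame(
  group = c("A", "B", "C", "D", "E"),
  prop  = c(0.10, 0.28, 0.22, 0.25, 0.15)
)

# Encoding 1: position/length, where bar height maps to prop
p_length <- ggplot(groups, aes(x = group, y = prop)) +
  geom_col()

# Encoding 2: color, where tile fill maps to prop and position only carries the group label
p_color <- ggplot(groups, aes(x = group, y = 1, fill = prop)) +
  geom_tile() +
  scale_y_continuous(breaks = NULL, name = NULL)
```

Printing `p_length` and `p_color` side by side shows the difference: reading group B from the bars means comparing a height against the y-axis, while reading it from the tiles means matching a shade against the legend.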
One proportion for each of five groups (A-E)
Guess group B’s numerical value as a proportion, e.g. 0.98.
Your answer would probably be close to 0.28
For each of the next 8 plots, guess group B’s numerical value based on the plot. Enter your guesses on this Google Form:
| Plot | Value |
|---|---|
| Plot 1: Sorted Pie Chart | 0.182 |
| Plot 2: Unsorted Pie Chart | 0.379 |
| Plot 3: Filled Bar Chart | 0.478 |
| Plot 4: Bar Chart | 0.212 |
| Plot 5: Unsorted Scatterplot | 0.067 |
| Plot 6: Sorted Scatterplot | 0.350 |
| Plot 7: Colors | 0.273 |
| Plot 8: Area | 0.415 |
| Plot | Easiest (votes) | Most difficult (votes) |
|---|---|---|
| Plot 1: Sorted Pie Chart | 1 | 1 |
| Plot 2: Unsorted Pie Chart | 1 | 1 |
| Plot 3: Filled Bar Chart | 0 | 5 |
| Plot 4: Bar Chart | 14 | 0 |
| Plot 5: Unsorted Scatterplot | 0 | 0 |
| Plot 6: Sorted Scatterplot | 5 | 0 |
| Plot 7: Colors | 0 | 10 |
| Plot 8: Area | 0 | 4 |
Takeaway: some aesthetics communicate data better than others
Figure 14 from Mackinlay (1986)
Vertex Fellow, Data Visualization at Vertex Pharmaceuticals.
Formerly Creative Director of the Broad Institute of MIT and Harvard and adjunct assistant professor in the Department of Art as Applied to Medicine at Johns Hopkins
Published a monthly column on data visualization in the journal Nature Methods from 2010 to 2012
Different visual variables encoding the same five values.
Figure 1c from Wong (2010a)
What is the rate of change of atmospheric CO2 over time?
Figure 6 from Cleveland and McGill (1985)
What is the relative size of the big circle versus the small circle?
14x
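The arithmetic behind why area is hard to judge, assuming the 14x refers to the ratio of the two circles’ areas: because a circle’s area is πr², a 14-fold difference in area is only about a 3.7-fold difference in radius (or diameter), so the circles’ linear sizes differ far less than their areas do.

```r
# Assumption: "14x" is the ratio of the two circles' areas
area_ratio <- 14

# Circle area is pi * r^2, so the radius ratio is the square root of the area ratio
radius_ratio <- sqrt(area_ratio)

round(radius_ratio, 2)  # 3.74
```

The same distinction shows up in ggplot2: `scale_size()` (the default for the size aesthetic) scales point area, `scale_radius()` scales radius, and `scale_size_area()` additionally maps a value of 0 to a size of 0, so that area is proportional to the value.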
How does the distance between the lines vary?
It’s constant
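A sketch of the same effect (not a reproduction of Cleveland and McGill’s figure): two curves whose vertical gap is exactly 1 everywhere. The curve shape `3 * sin(x)` is an arbitrary choice. Treating the curves as locally straight with slope m, the perpendicular gap is about 1 / sqrt(1 + m²), which shrinks where the curves are steep; this is one way to see why the vertical gap looks like it varies even though it does not.

```r
library(ggplot2)

# Two curves with a constant vertical gap of 1 (arbitrary curve shape, illustration only)
x <- seq(0, 2 * pi, length.out = 200)
curves <- data.frame(
  x     = rep(x, times = 2),
  y     = c(3 * sin(x), 3 * sin(x) + 1),
  curve = rep(c("lower", "upper"), each = length(x))
)

p_illusion <- ggplot(curves, aes(x = x, y = y, group = curve)) +
  geom_line()

# Local slope of both curves is m = 3 * cos(x); treating them as locally straight,
# the perpendicular gap is approximately 1 / sqrt(1 + m^2): close to 1 where the
# curves are flat and much smaller where they are steep
perp_gap <- 1 / sqrt(1 + (3 * cos(x))^2)
range(perp_gap)
```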
Figure 1c from Wong (2010a)
Figure 15 from Mackinlay (1986)
Point out the swap between the quantitative and ordinal rankings of (length, angle, slope, area, volume) and (density, color saturation, color hue, texture, connection, containment), and then how hue, texture, connection, and containment rank even higher for nominal data than for ordinal data.
Lines in graphs create a clear connection. Enclosure is an effective way to draw attention to a group of objects.
Figure 2b from Wong (2010b)
Example of shape used to group nominal data; connection (right-hand side) groups even more strongly, but enclosure can counteract that connection if needed
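A hedged ggplot2 sketch of the three grouping cues on hypothetical data (the values and the rectangle’s coordinates are invented for illustration): shape alone, shape plus connecting lines, and a shaded enclosure that groups the last two time points across both series.

```r
library(ggplot2)

# Hypothetical measurements for two groups at five time points
pts <- data.frame(
  time  = rep(1:5, times = 2),
  value = c(2, 3, 5, 4, 6, 1, 2, 2, 3, 4),
  grp   = rep(c("g1", "g2"), each = 5)
)

# Similarity: group membership shown by point shape only
p_shape <- ggplot(pts, aes(x = time, y = value, shape = grp)) +
  geom_point(size = 3)

# Connection: lines join the points within each group
p_connect <- p_shape +
  geom_line(aes(group = grp))

# Enclosure: a shaded rectangle groups points from both series at times 4-5
p_enclose <- p_connect +
  annotate("rect", xmin = 3.5, xmax = 5.5, ymin = 2.5, ymax = 6.5, alpha = 0.15)
```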
What regions of the US experienced greatest population growth?
Figure 4.2, Wilke (2019)
Key point: color is much more effective for grouping observations than for representing numerical values
How do positive teaching practices among Malawi’s teachers compare with those in other Sub-Saharan African countries?
Figure 3.2, Asim (2024)
Another example of effective use of color: a distinctive color highlights the country we are interested in, Malawi, while the other countries are shown in shades of blue. However, it’s tempting but wrong to read meaning into the different shades of blue.
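One way to keep the highlight while avoiding the misleading shades, sketched in ggplot2 with hypothetical data: "Focus" stands in for the country of interest and the scores are invented. The country of interest gets one distinctive color (the orange from the migration plots later in these slides) and every other country gets the same neutral gray, so there are no shades to misread.

```r
library(ggplot2)

# Hypothetical scores; "Focus" stands in for the country of interest
scores <- data.frame(
  country = c("Focus", "Other 1", "Other 2", "Other 3", "Other 4"),
  score   = c(0.45, 0.60, 0.38, 0.52, 0.41)
)
scores$highlight <- ifelse(scores$country == "Focus", "focus", "other")

# One distinctive color for the focus country, one neutral color for the rest
p_highlight <- ggplot(scores, aes(x = reorder(country, score), y = score, fill = highlight)) +
  geom_col() +
  scale_fill_manual(values = c(focus = "#ED6B36", other = "grey70"), guide = "none") +
  coord_flip() +
  labs(x = NULL)
```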
How do entrance and pass rates for Primary School Leaving Certificate Examinations (PSLCE) compare between boys and girls in Malawi?
Figure 5.2, Asim (2024)
Ultimately, girls are 6 percent less likely than boys to enter the Primary School Leaving Certificate Examinations (PSLCE) and 13 percent less likely than boys to pass (refer to figure 5.2, panel b).
The 13% statistic requires comparing the relative height of the striped bar for boys against the relative height of the striped bar for girls, which is challenging without the annotated values.
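Printing the values on the bars removes the need to compare heights by eye. A minimal ggplot2 sketch with hypothetical rates (not the values in Figure 5.2): `position_dodge()` is shared by the bars and the labels so that each label sits above its own bar.

```r
library(ggplot2)

# Hypothetical rates for illustration only (not the values in Figure 5.2)
rates <- data.frame(
  measure = rep(c("Entered", "Passed"), times = 2),
  sex     = rep(c("Boys", "Girls"), each = 2),
  rate    = c(0.80, 0.60, 0.75, 0.52)
)

# Use the same dodge for bars and labels so they line up
dodge <- position_dodge(width = 0.9)

p_annot <- ggplot(rates, aes(x = measure, y = rate, fill = sex)) +
  geom_col(position = dodge) +
  # Print each value (as a whole-number percentage) just above its bar
  geom_text(aes(label = sprintf("%.0f%%", 100 * rate), group = sex),
            position = dodge, vjust = -0.3)
```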
Toward Safer and More Productive Migration for South Asia
Number of deployments is calculated as average for Bangladeshi, Indian, Nepali, Pakistani, and Sri Lankan labor migrants in their respective top five destination countries… remittances are defined as total amount of remittances that flow into Bangladesh, India, Nepal, and Pakistan.
Figure 3.4, Ahmed (2022)
Questions:
```r
library(ggplot2)

# assumes the class `migration` data frame with columns year, variable, pct_change_diff
ggplot(migration) +
  geom_line(aes(x = year, y = pct_change_diff, color = variable), linewidth = 1) +
  # you could stop here for the no-spice option (make sure to delete the '+' above)
  # otherwise, the rest of the code is how you get the yoga flame option
  geom_hline(yintercept = 0, linetype = "dashed") +
  geom_vline(xintercept = 2008, linetype = "dashed") +
  scale_x_continuous(breaks = 2007:2015, minor_breaks = NULL, name = NULL) +
  scale_y_continuous(name = "Annual growth rate (%)", breaks = seq(-100, 100, by = 20)) +
  scale_color_manual(values = c("#ED6B36", "#78ACD9"), name = NULL,
                     labels = c("Number of deployments (from sending)", "Remittances (into sending)")) +
  theme(legend.position = "bottom",
        axis.line = element_line(),
        panel.background = element_blank())
```

Connecting the points with lines emphasizes the change from one year to the next, i.e., the change in the annual growth rate. How easy is this to interpret?
Removing the lines makes the differences less dramatic (probably a good thing)
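The points-only version needs just one change: `geom_point()` in place of `geom_line()`. The sketch below uses `migration_toy`, a hypothetical stand-in with the same column names as the class `migration` data but invented values; substitute the real data frame in practice.

```r
library(ggplot2)

# Hypothetical stand-in for `migration`: same columns, invented values
migration_toy <- data.frame(
  year            = rep(2007:2010, times = 2),
  variable        = rep(c("deployments", "remittances"), each = 4),
  pct_change_diff = c(30, 10, -20, 5, 25, 30, 5, 15)
)

# Points only: each year's growth rate is shown without connecting lines
p_points <- ggplot(migration_toy) +
  geom_point(aes(x = year, y = pct_change_diff, color = variable), size = 2) +
  geom_hline(yintercept = 0, linetype = "dashed")
```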
```r
library(dplyr)

migration <-
  migration %>%
  # 'pct_change_mult' is the multiplier for the year-over-year change,
  # e.g. 10% increase = 1.10 multiplier, 10% decrease = 0.90 multiplier
  mutate(pct_change_mult = 1 + pct_change_diff / 100) %>%
  # the next mutate is year by year, so it must be computed separately for each
  # variable (deployments or remittances) with group_by(), with rows in year order
  group_by(variable) %>%
  arrange(year, .by_group = TRUE) %>%
  # the cumulative product of the multipliers is the multiplier for the change from 2006,
  # e.g. a 10% increase in year 1 and a 20% decrease in year 2 give 1.10 * 0.80 = 0.88,
  # an overall decrease of 12% from year 0 (2006); '100 * ... - 100' converts it to % change
  mutate(pct_change_mult_2006 = 100 * cumprod(pct_change_mult) - 100) %>%
  ungroup()
```
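A quick check of the worked example in the comments of the code above (a 10% increase followed by a 20% decrease), run through the same pipeline on a two-row toy data frame. The `variable` value `"example"` is a placeholder.

```r
library(dplyr)

# Two-year example: +10% in 2007, then -20% in 2008
toy <- tibble(
  variable        = "example",
  year            = c(2007, 2008),
  pct_change_diff = c(10, -20)
)

toy_out <- toy %>%
  mutate(pct_change_mult = 1 + pct_change_diff / 100) %>%
  group_by(variable) %>%
  arrange(year, .by_group = TRUE) %>%
  # % change from 2006: 100 * 1.10 - 100 for 2007, 100 * (1.10 * 0.80) - 100 for 2008
  mutate(pct_change_mult_2006 = 100 * cumprod(pct_change_mult) - 100) %>%
  ungroup()

toy_out$pct_change_mult_2006
```

The result is 10 for 2007 and -12 for 2008 (up to floating-point rounding), matching the comment.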
```r
ggplot(migration) +
  geom_line(aes(x = year, y = pct_change_mult_2006, color = variable), linewidth = 1) +
  # stop here for the medium spice option (make sure to delete the '+' above);
  # otherwise, the rest of the code is how you get the dim mak option
  geom_hline(yintercept = 0, linetype = "dashed") +
  geom_vline(xintercept = 2008, linetype = "dashed") +
  scale_x_continuous(breaks = 2007:2015, minor_breaks = NULL, name = NULL) +
  scale_y_continuous(name = "% Change from 2006", breaks = seq(-100, 100, by = 20)) +
  scale_color_manual(values = c("#ED6B36", "#78ACD9"), name = NULL,
                     labels = c("Number of deployments (from sending)", "Remittances (into sending)")) +
  theme(legend.position = "bottom",
        axis.line = element_line(),
        panel.background = element_blank())
```

No Spice: Make an approximate version of my recreation of Figure 3.4 on slide 34: focus just on the structure
Weak Sauce: No menu options today…
Medium Spice: Make an approximate version of my ‘% Change from 2006’ plot on slide 36: focus just on the structure
Yoga Flame: Make an exact replica of my recreation of Figure 3.4 on slide 34. I’m looking for perfection!
Dim Mak: Make an exact replica of my ‘% Change from 2006’ plot on slide 36. I’m looking for perfection!
Ahmed, S.A. and Bossavie, L. eds., 2022. Toward Safer and More Productive Migration for South Asia. World Bank Publications.
Asim, S. and Gera, R.C., 2024. What Matters for Learning in Malawi? Evidence from the Malawi Longitudinal School Survey. World Bank Publications.
Cleveland, W.S. and McGill, R., 1985. Graphical perception and graphical methods for analyzing scientific data. Science, 229(4716), pp.828-833.
Mackinlay, J., 1986. Automating the design of graphical presentations of relational information. ACM Transactions on Graphics (TOG), 5(2), pp.110-141.
Wehrli, U., 2003. Tidying Up Art. Prestel Publishing.
Wilke, C.O., 2019. Fundamentals of data visualization: a primer on making informative and compelling figures. O’Reilly Media.
Wong, B., 2010a. Design of data figures. Nature Methods, 7(9), pp.665-666.
Wong, B., 2010b. Points of view: Gestalt principles (Part 1). Nature Methods, 7(11), p.863.